Reproducible analyses with targets and Docker: An example from reprohack

Joel H. Nitta

Reprohack
2022-10-11


https://joelnitta.github.io/reprohack_2022-10-11

Self-introduction

@joel_nitta

https://joelnitta.com

  • Project Research Associate @ Tokyo University

  • Research interests: Ecology and evolution of ferns

  • Hobbies: Running (after my 7YO!)

Photo: Joel Nitta collecting fern gametophytes (credit: J-Y Meyer)

Outline of today’s talk

  • Introduction to {targets} and Docker

  • Demo: Pleurosoriopsis project

  • Q & A

Why {targets} and Docker?

  • To make your code more reproducible

What is reproducibility?

The ability for others (or your future self) to re-run your code and get the same results

  • Not a binary “yes” or “no” but a matter of degree
  • Reproducibility = mindset
  • Many aspects
    • Data availability
    • Code automation
    • Computing environment

Analysis workflows

Which steps need to be run in what order?

If one part of a workflow changes, how does it affect other parts?

How much of it do we need to re-run?

What is {targets}?

  • R package for automating analysis workflows

  • Only runs necessary steps

  • Can run workflow steps in parallel (speeds up analysis)

  • Provides evidence that results are up to date with the code and data that produced them
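
The idea behind "only runs necessary steps" can be sketched in a few lines of shell. This is a conceptual illustration only, not {targets} itself or a description of how it is implemented: `run_step` is a hypothetical helper that stores a checksum of a step's input file and re-runs the step only when that checksum has changed.

```shell
# Conceptual sketch only: NOT part of {targets}.
# run_step (hypothetical helper) re-runs a command only when the
# content of its input file differs from the last recorded checksum.
run_step() {
  input=$1    # file the step depends on
  stamp=$2    # file holding the checksum seen on the previous run
  shift 2     # remaining arguments are the command to run

  new_sum=$(cksum < "$input")
  old_sum=""
  if [ -f "$stamp" ]; then
    old_sum=$(cat "$stamp")
  fi

  if [ "$new_sum" = "$old_sum" ]; then
    echo "skip"
  else
    # Record the checksum only if the command succeeded
    "$@" && printf '%s\n' "$new_sum" > "$stamp" && echo "run"
  fi
}

# Demo in a temporary folder; the "analysis" here is just a file copy.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/data.txt"

first=$(run_step "$workdir/data.txt" "$workdir/stamp" \
  cp "$workdir/data.txt" "$workdir/result.txt")
second=$(run_step "$workdir/data.txt" "$workdir/stamp" \
  cp "$workdir/data.txt" "$workdir/result.txt")

printf 'c\n' >> "$workdir/data.txt"   # change the input
third=$(run_step "$workdir/data.txt" "$workdir/stamp" \
  cp "$workdir/data.txt" "$workdir/result.txt")

echo "$first $second $third"   # prints "run skip run"
```

The second call is skipped because data.txt did not change; appending a line changes the checksum, so the third call runs again.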

{targets} resources

What is Docker?

Docker provides containers: self-contained packages of software with everything needed to run an application

Why use Docker?

  • A given piece of software depends on many other pieces of software (dependencies)

  • If dependencies change (or are missing), your program may not run

  • Docker containers bundle all the dependencies needed, so the program runs the same way on any computer that can run Docker
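
To make the dependency problem concrete, here is a minimal shell sketch of the kind of check a script needs when it relies on tools installed on the host. `missing_deps` is a hypothetical helper written for this illustration, and `no_such_tool_xyz` is a deliberately made-up command name.

```shell
# Hypothetical helper: print the names of any commands that are
# not available on this machine (space-separated).
missing_deps() {
  missing=""
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      missing="$missing $cmd"
    fi
  done
  # Drop the leading space before printing
  echo "${missing# }"
}

# "sh" exists on any POSIX system; the second name is made up.
result=$(missing_deps sh no_such_tool_xyz)
echo "Missing: $result"   # prints "Missing: no_such_tool_xyz"
```

When the analysis runs inside a container, the image already ships its dependencies, so the host only needs Docker itself.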

'It works on my machine' meme

Docker resources

Example: Pleurosoriopsis

  • Study on growth conditions of fern Pleurosoriopsis makinoi in Japan

  • The original reprohack used {drake}, but {targets} is newer and supersedes {drake}, so I recommend {targets}

  • Available on the targets branch of the pleurosoriopsis repo:

https://github.com/joelnitta/pleurosoriopsis/tree/targets

Image of fern Pleurosoriopsis makinoi

Demo

  • The only required software is Docker
  • The analysis can be replicated with one command:

docker run --rm -v "${PWD}":/wd -w /wd joelnitta/pleurosoriopsis:targets \
  bash /tmp/make.sh

Interacting with the code

  • The Docker image includes RStudio, so we can interact with the code inside the container

(Run the following command from within the pleurosoriopsis folder.)

docker run --rm -dt -v "${PWD}":/home/rstudio/pleurosoriopsis -p 8787:8787 \
  --name pleuro \
  -e DISABLE_AUTH=true \
  joelnitta/pleurosoriopsis:targets

  • Navigate to localhost:8787/ in your browser
  • In the RStudio “Files” pane, click the pleurosoriopsis folder
  • Click Pleurosoriopsis.Rproj to open the project.

Cleaning up

  • Stop the container (it was started with --rm, so it is removed automatically once stopped):

docker kill pleuro

  • (Optional) Remove the image:

docker rmi joelnitta/pleurosoriopsis:targets

Questions?